ABSTRACT

A new technique, named SHARP, is presented for the partitioning of VLSI integrated circuits. SHARP is a hill-climbing heuristic that is designed to be incorporated into a partitioning-based placement algorithm. Its important features include a geometric decomposition of the layout surface into nine regions arranged in a '#' shape; a multi-objective function that represents wire usage more accurately than the standard min-cut criterion; and extensive use of Steiner trees. A series of experiments demonstrates that the SHARP technique produces very high quality partitions.

INTRODUCTION 

The physical design process for VLSI circuits is often one of hierarchical decomposition. At all levels of the hierarchy, an important design step is partitioning the atomic circuit elements that compose the functional unit into a physical package. The physical package is typically realized as a collection of subpackages or modules that are chosen such that together they optimize some predetermined figures of merit. The principal figures of merit usually concern one or more of the following: the number of modules, the size of modules, the number of external connections required by any module, and system delay [6,13].

Circuit partitioning research has concentrated primarily on the min-cut partitioning problem, which divides a circuit into two roughly equal-sized partitions in a manner that minimizes the inter-module connections. These investigations have produced a variety of circuit element migration techniques that iteratively transform a given solution [5,6,7]. While these solution methods are primarily greedy heuristics, they all use hill-climbing techniques to varying extents.

Min-cut partitioning methods have proven to be quite effective for their traditional application of circuit packaging, where a package is characterized by its size and its number of external terminals. This success led researchers to apply the method to other physical design problems, most notably VLSI circuit placement [1,4,9,10].

The goal of the placement step is to optimally position circuit elements on a layout surface. Positioning a circuit element has two basic components: determining its location and specifying its orientation. An optimal assignment is typically one that allows the subsequent interconnection (routing) step to achieve its goals automatically. These goals are often over-constrained and almost always include minimizing the total wire length and the layout surface area. Although placement is clearly a problem of at least two dimensions, min-cut partitioning methods can produce a solution in the following manner.

•  Apply the partitioning method to construct two partitions. Elements in different partitions are constrained to lie in different halves of the package.

•  Apply the method recursively and separately to the two partitions. The recursion terminates when no partition contains more than one circuit element.

The  technique  is  depicted  graphically  in  Figure  1. 
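As a rough illustration of the recursive scheme above, the following Python sketch places each element at the center of its final region. Alternating vertical and horizontal cuts is our assumption, and `min_cut_bisect` is a hypothetical stand-in that exhaustively searches balanced splits, which is feasible only for a handful of elements. The sketch deliberately treats every subproblem in isolation (no terminal propagation).

```python
from itertools import combinations


def min_cut_bisect(cells, nets):
    """Exhaustive balanced bisection minimizing the number of cut nets.

    Only feasible for tiny inputs; nets are restricted to `cells`.
    """
    cells = sorted(cells)
    local = [set(net) & set(cells) for net in nets]
    best_cut, best_a = None, None
    for combo in combinations(cells, len(cells) // 2):
        a = set(combo)
        # a net is cut if it has terminals on both sides
        cut = sum(1 for net in local if net & a and net - a)
        if best_cut is None or cut < best_cut:
            best_cut, best_a = cut, a
    return best_a, set(cells) - best_a


def place(cells, nets, x0, y0, w, h, vertical=True, out=None):
    """Recursively bisect `cells` and the region, alternating the cut direction."""
    out = {} if out is None else out
    if not cells:
        return out
    if len(cells) == 1:
        (cell,) = cells
        out[cell] = (x0 + w / 2, y0 + h / 2)  # center of the final region
        return out
    a, b = min_cut_bisect(cells, nets)
    if vertical:  # vertical cut line: left half / right half
        place(a, nets, x0, y0, w / 2, h, False, out)
        place(b, nets, x0 + w / 2, y0, w / 2, h, False, out)
    else:  # horizontal cut line: lower half / upper half
        place(a, nets, x0, y0, w, h / 2, True, out)
        place(b, nets, x0, y0 + h / 2, w, h / 2, True, out)
    return out
```

For example, `place({0, 1, 2, 3}, [{0, 1}, {2, 3}], 0, 0, 4, 4)` keeps each two-element net on one side of the first (vertical) cut.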

Placement researchers realized that the above method is too simple. To achieve acceptable solutions, the subproblems cannot be dealt with in isolation. For example, prediction techniques such as terminal propagation [4] and in-place partitioning [9] are used to help place a given circuit element by considering how its nets enter its partition block. With such techniques, min-cut-based placers (MCPs) can be effective. This was demonstrated by Suaris and Kedem [16] in their comparison of an MCP using terminal propagation with TIMBERWOLF [14], a state-of-the-art placer based on simulated annealing. For benchmark circuit Primary 1 of the ACM/IEEE Physical Design Workshop [12], the MCP produced a solution whose layout surface and wire length were within 13% of TIMBERWOLF's, with a running-time speed-up factor of over 100. However, Suaris and Kedem observed that the MCP's solution quality was not consistent and that it deteriorated as circuit instances grew larger. For example, the MCP's solution quality for the larger benchmark circuit Primary 2 was worse than TIMBERWOLF's by over 20%. Although further min-cut partitioning research continues to improve partitioning quality [11,15], the fundamental problem remains that a min-cut algorithm is one-dimensional.

Figure 1. Min-cut partitioning-based placement.

The one-dimensionality of min-cut partitioning led Suaris and Kedem [16,17] to develop their quadrisection approach, which simultaneously partitions a circuit into four quadrants rather than the traditional two halves. By attaching non-uniform weights to horizontal, vertical, and diagonal crossings, either vertical or horizontal cuts can be favored. Thus, the quadrisection technique allows some routing congestion balancing to occur between the upper and lower quadrants and between the left and right quadrants. Suaris and Kedem's experiments with quadrisection indicate that the method's solution quality is competitive with TIMBERWOLF's, while running in only a tenth of the time. Yet, in spite of this generalization, some problems remain. For example, simultaneous congestion balancing is limited to quadrants in different halves, although the preference in practice is to balance cuts on opposite sides of the same half (e.g., in most design methodologies a standard cell channel has uniform height). As another example, the estimate of the routing area required by a net remains crude, as the coarseness of the partitioning allows only straight-line, single-bend, and horseshoe connections (with rotations) to be considered.

To overcome these problems, we propose a new partitioning method that is more strongly influenced by the geometry of the layout surface. It is tuned for intra-package connections rather than inter-package connections. The method is named SHARP because the circuit layout surface is decomposed geometrically into nine regions in a manner that resembles a musical sharp sign, '#'. This is demonstrated graphically in Figure 2(a). In this figure, the nine partition blocks S = {S1,...,S9} are canonically ordered. In Figure 2(b), the twelve interior SHARP boundary segments C = {C1,...,C12} are labeled similarly.
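A minimal sketch of one way to index the SHARP decomposition, assuming the blocks S1,...,S9 are numbered in reading order (left to right, top to bottom) and that the segments C1,...,C12 are numbered the same way: each row's two vertical segments, followed by the three horizontal segments beneath that row. This labeling is our assumption and may differ from Figure 2(b).

```python
def block_position(block):
    """Row and column (0-based, top-left origin) of block S1..S9 in reading order."""
    return divmod(block - 1, 3)


def sharp_segments():
    """Map segment label k (1..12) to the pair of blocks it separates.

    Assumed numbering: the two vertical segments of a row, then the three
    horizontal segments beneath it, proceeding from top to bottom.
    """
    segments, k = {}, 1
    for row in range(3):
        for col in range(2):  # vertical segments inside this row
            left = 3 * row + col + 1
            segments[k] = (left, left + 1)
            k += 1
        if row < 2:
            for col in range(3):  # horizontal segments below this row
                upper = 3 * row + col + 1
                segments[k] = (upper, upper + 3)
                k += 1
    return segments
```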
Figure 3. Net block decomposition with six different Steiner tree forms.

THE  SHARP-LOOKING  PHILOSOPHY 
The SHARP decomposition was selected because it is the smallest nontrivial symmetric decomposition that allows contiguous regions of the circuit surface that share similar routing features and problems (e.g., congestion) to be processed as a unit. This small size keeps all of its computations readily tractable. For example, in determining the preferred interconnection for a given net block decomposition, every minimum-length Steiner tree form can be considered for the net. There are on average fewer than five such Steiner forms per decomposition, and no net decomposition requires the consideration of more than 192 different Steiner tree forms. Similarly, in determining favorable alternative decompositions for a net after one or more of its circuit elements move from one block to another, on average only two new Steiner tree forms need to be considered. Also, since the total number of minimum-length Steiner tree forms is less than three thousand, these forms can be precomputed once and accessed via hashing or an appropriate indexing scheme.
The trees given in Figure 3 are the six possible minimum-length Steiner tree forms corresponding to the given block distribution of terminals.
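The precomputation can be sketched under a simple model that we assume here: a Steiner tree form is a tree in the 3x3 block adjacency graph, each edge corresponding to one boundary segment crossed, and its length is the number of segments crossed. The sketch enumerates all 4096 subsets of the twelve segments once, keeps those that form trees, and stores, for every block set with at least two blocks, its minimum-length forms in a dictionary keyed by a `frozenset` (the hashing scheme). Names such as `build_form_table` are illustrative.

```python
from collections import Counter
from itertools import combinations

# Adjacent block pairs of the 3x3 grid (blocks 1..9 in reading order); each
# pair stands for one interior boundary segment.
EDGES = [(a, a + 1) for a in range(1, 10) if a % 3 != 0] + [(a, a + 3) for a in range(1, 7)]


def is_tree(edges):
    """True if the edges form a single tree (connected and acyclic)."""
    verts = {v for e in edges for v in e}
    if len(edges) != len(verts) - 1:
        return False
    parent = {v: v for v in verts}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:  # this edge would close a cycle
            return False
        parent[ra] = rb
    return True


def build_form_table():
    """Map each block set T (|T| >= 2) to (min length, list of minimum-length forms)."""
    table = {}
    for k in range(1, len(EDGES) + 1):  # increasing size: first hit per key is minimal
        for edges in combinations(EDGES, k):
            if not is_tree(edges):
                continue
            degree = Counter(v for e in edges for v in e)
            leaves = [v for v, d in degree.items() if d == 1]
            inner = [v for v, d in degree.items() if d > 1]
            # A minimum tree for T has all its leaves in T, so T = leaves + some inner blocks.
            for r in range(len(inner) + 1):
                for extra in combinations(inner, r):
                    key = frozenset(leaves).union(extra)
                    entry = table.setdefault(key, (k, []))
                    if entry[0] == k:
                        entry[1].append(frozenset(edges))
    return table
```

Under this model, a net occupying all nine blocks has 192 minimum-length forms (the spanning trees of the 3x3 grid), and a net with terminals only in the corner blocks S1 and S9 has six.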

As most partitioning algorithms are one-dimensional in nature, their optimization function consists of a single criterion, and, as noted above, partitioning-based placement algorithms use the min-cut criterion. However, the true principal figure of merit for evaluating placement quality is layout surface size. Once the circuit elements have been chosen, this reduces to minimizing the routing region. For most design methodologies, minimizing the routing region has two primary components: minimizing total wire length and minimizing channel height. Therefore, SHARP uses these two criteria to evaluate partition quality. The result is a better estimate of the expected wire usage. Experiments indicate that a SHARP-based placer is generally less susceptible than an MCP to decomposition ordering.
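To make the two criteria concrete, the sketch below scores a set of chosen Steiner forms. It assumes each net's form is given as the set of boundary segments it crosses, measures wire length as the number of segment crossings, and approximates channel height by the peak congestion on any single segment. The weights `alpha` and `beta` are hypothetical, since the text does not say how the two components are combined.

```python
from collections import Counter


def wire_usage(forms, alpha=1.0, beta=1.0):
    """Evaluate the two wire-usage components for the chosen Steiner forms.

    forms: dict net -> iterable of boundary-segment labels the net's tree crosses.
    Returns (total_length, max_congestion, weighted_score).
    """
    crossed = {net: set(segs) for net, segs in forms.items()}
    congestion = Counter(seg for segs in crossed.values() for seg in segs)
    total_length = sum(len(segs) for segs in crossed.values())
    max_congestion = max(congestion.values(), default=0)  # proxy for channel height
    return total_length, max_congestion, alpha * total_length + beta * max_congestion
```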

Through use of a congestion map defined over the boundary segments C, SHARP can better estimate channel height. Depending upon the circuit design methodology in use, SHARP can be configured to control cuts in a variety of ways and combinations. For example, it can control congestion by limiting the number of cuts across any one line, whether horizontal or vertical, and it can also balance the number of cuts that span different lines or even parts of different lines. Thus, if desired, SHARP can favor balancing, jointly or independently, the congestion across such segments as (among others) C1 and C2, C6 and C7, C11 and C12, C3 and C8, C4 and C9, and C5 and C10.
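One possible congestion map and balance measure, sketched in Python: the map counts the nets crossing each segment C1,...,C12, and the imbalance of a segment pair is taken, as an assumption, to be the absolute difference of their counts. The pairs are those listed above; the data layout (a net's form as a set of segment labels) is also assumed.

```python
from collections import Counter

# Segment pairs whose congestion SHARP can be configured to balance.
BALANCE_PAIRS = [(1, 2), (6, 7), (11, 12), (3, 8), (4, 9), (5, 10)]


def congestion_map(forms):
    """Count how many nets cross each boundary segment C1..C12."""
    cmap = Counter({k: 0 for k in range(1, 13)})
    for segs in forms.values():
        cmap.update(set(segs))  # a net counts at most once per segment
    return cmap


def imbalance(cmap, pairs=BALANCE_PAIRS):
    """Per-pair absolute congestion difference (an assumed balance measure)."""
    return {pair: abs(cmap[pair[0]] - cmap[pair[1]]) for pair in pairs}
```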

Just as SHARP's optimization function is more complete than the min-cut criterion, so is the SHARP solution itself. Besides returning an assignment of circuit elements to partition blocks, as a standard partitioning algorithm does, SHARP also returns a suggested Steiner tree form for each net that achieves the optimal expected use of the layout surface. This additional information makes it easier for a SHARP-based placer to incorporate a global router.
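As an illustration of this richer solution, the hypothetical container below holds both the element-to-block assignment and each net's chosen form (as a set of adjacent block pairs), together with a consistency check that a form is a tree touching every block its net occupies. The class and field names are ours, not part of SHARP.

```python
from dataclasses import dataclass, field


@dataclass
class SharpSolution:
    """block_of: circuit element -> block 1..9; form_of: net -> set of block pairs (a, b)."""
    block_of: dict
    form_of: dict = field(default_factory=dict)

    def form_connects(self, net, elements):
        """True if the net's chosen form is a tree touching every block the net occupies."""
        blocks = {self.block_of[e] for e in elements}
        edges = self.form_of.get(net, set())
        if not edges:
            return len(blocks) <= 1  # a single-block net needs no inter-block wiring
        verts = {v for e in edges for v in e}
        if not blocks <= verts or len(edges) != len(verts) - 1:
            return False
        adj = {v: set() for v in verts}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        # connected with |E| = |V| - 1 implies the form is a tree
        seen, stack = set(), [next(iter(verts))]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
        return seen == verts
```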

In the section below, we describe in further detail a partitioning algorithm based on the above SHARP concepts.

SHARP  PARTITIONING 
The basic SHARP algorithm is given in Figure 4. As shown there, the algorithm is a greedy one that essentially alternates between improving the two wire usage components. We found that this alternation strengthened SHARP's hill-climbing abilities.

The initial partition is constructed using a simplified clustering algorithm [2]. However, we are also considering alternative constructions using techniques such as a genetic algorithm [3]. The initial Steiner tree form of each net is selected randomly from among its minimum-length forms.

algorithm
    compute Steiner tree forms
    construct initial partition
    for each net u do
        assign to u one of its minimum length Steiner trees
    end
    while partition quality is improving do
        perform net length minimizing circuit element movements
        perform congestion reduction through alternative minimum Steiner tree selection
        perform congestion reduction through circuit element movements
    end
    perform congestion reduction through alternative Steiner tree selection
end

Figure 4. Basic partitioning algorithm.
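The control flow of Figure 4 can be sketched as a Python driver that receives each step as a callback. Everything here is hypothetical scaffolding: the state type, the step functions, the `quality` measure (larger is better), and the `max_passes` guard. The sketch returns the best state seen, a detail Figure 4 leaves open, and assumes the initial steps (form precomputation, initial partition, initial tree assignment) have already produced `state`.

```python
def sharp_driver(state, quality, length_moves, reselect_min_forms,
                 congestion_moves, reselect_any_forms, max_passes=20):
    """Alternate the three improvement steps while quality improves, then run the final step."""
    best_state, best_q = state, quality(state)
    for _ in range(max_passes):  # guard against non-termination
        state = length_moves(state)        # net length minimizing moves
        state = reselect_min_forms(state)  # alternative minimum-length forms
        state = congestion_moves(state)    # congestion-driven moves
        q = quality(state)
        if q <= best_q:  # this pass did not improve partition quality
            break
        best_state, best_q = state, q
    # final step: non-minimum Steiner forms may also be selected
    return reselect_any_forms(best_state)
```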

Net length minimizations are performed iteratively. During each iteration, the circuit element cluster ξ whose inter-block movement induces the greatest reduction in wire length is relocated to the desired block. The circuit elements in ξ are then frozen in that block for the remainder of the step. In addition, non-frozen circuit elements that share a net with a circuit element in ξ have their inter-block preferences updated.
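A simplified sketch of the net length minimization step follows. It assumes that clusters are single circuit elements, that block capacities can be ignored, and that a net's length is the minimum number of boundary segments crossed by a tree spanning its blocks, computed by brute force over connected block sets of the 3x3 grid. Gains are recomputed from scratch rather than maintained incrementally as described above, and the pass stops when no unfrozen element has a positive gain. All function names are illustrative.

```python
from functools import lru_cache


def neighbors(block):
    """Blocks sharing a boundary segment with `block` (1..9, reading order)."""
    r, c = divmod(block - 1, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < 3 and 0 <= c + dc < 3:
            yield 3 * (r + dr) + (c + dc) + 1


def is_connected(blocks):
    start = next(iter(blocks))
    seen, stack = {start}, [start]
    while stack:
        for nb in neighbors(stack.pop()):
            if nb in blocks and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == blocks


@lru_cache(maxsize=None)
def steiner_length(blocks):
    """Minimum number of segments crossed by a tree spanning `blocks` (a frozenset)."""
    if len(blocks) <= 1:
        return 0
    best = 8  # a spanning tree of all nine blocks always works
    for mask in range(1, 512):
        vs = {b for b in range(1, 10) if mask >> (b - 1) & 1}
        if blocks <= vs and len(vs) - 1 < best and is_connected(vs):
            best = len(vs) - 1
    return best


def wire_length(block_of, nets):
    return sum(steiner_length(frozenset(block_of[c] for c in net)) for net in nets)


def length_pass(block_of, nets):
    """Repeatedly apply the best single-element move, then freeze the moved element."""
    block_of, frozen = dict(block_of), set()
    while True:
        base, best = wire_length(block_of, nets), None
        for cell, home in list(block_of.items()):
            if cell in frozen:
                continue
            for target in range(1, 10):
                if target == home:
                    continue
                block_of[cell] = target  # trial move
                gain = base - wire_length(block_of, nets)
                block_of[cell] = home    # undo trial move
                if best is None or gain > best[0]:
                    best = (gain, cell, target)
        if best is None or best[0] <= 0:
            return block_of
        _, cell, target = best
        block_of[cell] = target
        frozen.add(cell)
```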

During the next two steps, the congestion map is examined to see whether better balancing can be achieved. In the first of these two steps, alternative minimum-length Steiner tree forms are considered for the various nets. Since no circuit element movement is done here and only minimum-length Steiner trees are considered, the effect on wire usage is limited to improving congestion (i.e., the wire length component does not increase). As in the net length minimization step, a priority ordering is established: nets are examined in an order based on the amount of possible congestion improvement.
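A sketch of congestion reduction by reselecting among each net's minimum-length forms: in each round, the net whose switch most reduces congestion is updated and then not revisited in this step. Congestion is scored here by the sum of squared segment counts, a hypothetical choice that penalizes peaks; the data layout (forms as sets of segment labels, indexed per net) is also assumed. Because every candidate form has minimum length, total wire length is unchanged.

```python
from collections import Counter


def congestion(choice, forms):
    """Count, per segment, the nets whose chosen form crosses it."""
    return Counter(seg for net, i in choice.items() for seg in forms[net][i])


def cost(cong):
    return sum(v * v for v in cong.values())  # penalizes congestion peaks


def reselect_forms(forms, choice):
    """forms: net -> list of alternative minimum-length forms (sets of segments).
    choice: net -> index of the currently selected form."""
    choice, done = dict(choice), set()
    while True:
        base, best = cost(congestion(choice, forms)), None
        for net, alternatives in forms.items():
            if net in done:
                continue
            for i in range(len(alternatives)):
                if i == choice[net]:
                    continue
                trial = {**choice, net: i}
                gain = base - cost(congestion(trial, forms))
                if best is None or gain > best[0]:
                    best = (gain, net, i)
        if best is None or best[0] <= 0:
            return choice
        _, net, i = best
        choice[net] = i
        done.add(net)
```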

The second congestion-reducing step uses circuit element movement to improve solution quality. As in the net length minimization step, the circuit elements are examined in priority order and are frozen for the step once they have been moved. However, in this step the circuit elements are selected with respect to possible congestion improvement rather than wire length improvement. Since circuit element moves are made with respect to congestion improvement, this step can increase total wire length. Similarly, the net length minimization step can increase the total congestion.

During both circuit element movement steps, the currently most desirable circuit element move may cause a partition block to be overloaded. Such overloading is initially permitted, but the amount of overloading allowed is reduced with each pass of the loop. As a further hill-climbing feature, SHARP can be configured to use multiple priority queues so that the best feasible circuit element move is performed. It can also be configured to find the best feasible pair, or even the best feasible chain, of circuit element moves.
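The overload schedule and feasibility check might look as follows. This is a sketch under assumptions: the linear schedule in `overload_allowance` is hypothetical, and a single heap whose infeasible entries are skipped stands in for the multiple priority queues mentioned above.

```python
import heapq


def overload_allowance(pass_no, initial=4, step=1):
    """Hypothetical linear schedule: extra load a block may take on pass `pass_no` (0-based)."""
    return max(0, initial - step * pass_no)


def best_feasible_move(candidates, load, capacity, allowance):
    """Return (gain, cell, target) for the highest-gain move that keeps the target
    block within capacity + allowance, or None if no move is feasible.

    candidates: iterable of (gain, cell, size, target_block).
    """
    heap = [(-gain, cell, size, target) for gain, cell, size, target in candidates]
    heapq.heapify(heap)  # max-gain first via negated gains
    while heap:
        neg_gain, cell, size, target = heapq.heappop(heap)
        if load[target] + size <= capacity[target] + allowance:
            return -neg_gain, cell, target
    return None
```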

The final step of the algorithm also attempts to improve (reduce) the congestion. Unlike in the previous congestion improvement steps, SHARP does not require here that the alternative Steiner tree forms be of minimum length. Although the number of such tree forms increases, it remains practical, and the computation cost is worth the increase in solution quality. For example, per net block decomposition there are on average fewer than five minimum-length Steiner tree forms and approximately 50 distinct non-minimum-length forms, with a maximum of 192 distinct forms per decomposition.

The running time of the partitioning algorithm is dominated by the cost of the while loop. Since this loop iterates only a few times in practice, the expected running time of the algorithm is proportional to the cost of a single pass. While no more than m movements can be made in either of the circuit element movement steps, where m is the number of circuit elements, the priority of a circuit element can change multiple times. Using an analysis similar to that of Fiduccia and Mattheyses [5], we can show that the total number of priority queue operations is on the order of p, where p is the total number of terminal pins. Since the maximum number of minimum-length Steiner trees per net block decomposition is independent of the circuit instance (i.e., a constant), the total work performed as a result of circuit element movement or alternative Steiner tree selection also remains proportional to p. Thus, the running time of a loop iteration is proportional to p log m, since priority queue manipulations (e.g., insertions and deletions) are readily done in logarithmic time.
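Collecting the argument in one expression, with k denoting the number of while-loop iterations (a symbol we introduce; the text states only that it is small in practice):

```latex
\begin{aligned}
T_{\text{pass}}  &= \underbrace{O(p)}_{\text{queue operations}} \cdot \underbrace{O(\log m)}_{\text{per operation}} = O(p \log m),\\
T_{\text{total}} &= O(k\, p \log m).
\end{aligned}
```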


EXPERIMENTAL  RESULTS 
While the results discussed in this section are definitely encouraging when compared to standard min-cut partitioning (SMC) algorithms, they must be viewed as preliminary; SHARP is still undergoing refinement with respect to its hill-climbing capabilities. As a result of this refinement, we expect to achieve significant additional improvements.

It must be noted that directly comparing a SHARP decomposition to one produced by an SMC using iterative decomposition is not strictly fair: SMCs were not designed to produce such decompositions. This SMC inability is demonstrated strikingly in our first table. However, the inability should not be excused; placements are not one-dimensional, and the tools that produce placement solutions should reflect this characteristic. Since coarse (global) routes, such as SHARP's inter-block Steiner routing, are not produced by SMCs, the comparison is also unfair to SHARP, as one of its important outputs is ignored.

SHARP and a Fiduccia and Mattheyses-type SMC [5] were both run on benchmark circuit Primary 1 of the ACM/IEEE Physical Design Workshop [12]. This instance has approximately 800 circuit elements and 1000 nets. The tools were also run on an in-house random circuit instance, Random 1, which was explicitly designed to be hard to decompose, using techniques similar to those proposed by Krishnamurthy [8]. Random 1 has 84 circuit elements and 153 nets. The results reported below are based on ten runs of each tool on each instance. The running time of both tools is negligible.

Table 1 compares the performance of SHARP and the SMC on Primary 1 with respect to total inter-block wire length. The table demonstrates that SHARP's wire usage is roughly 60% of the SMC's for both the average and the worst-case statistic (about 60% and 64%, respectively). In fact, SHARP's worst-case usage is approximately 66% of the SMC's average-case usage.


Method     Routing Length
           Average    Maximum
SMC        859        899
SHARP      518        572

Table 1. Comparing wire usage.
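The percentages quoted for Table 1 can be checked directly from its entries:

```python
# Entries of Table 1 (inter-block routing length on Primary 1)
smc = {"average": 859, "maximum": 899}
sharp = {"average": 518, "maximum": 572}

ratios = {stat: sharp[stat] / smc[stat] for stat in smc}
worst_vs_average = sharp["maximum"] / smc["average"]

for stat, ratio in ratios.items():
    print(f"{stat}: {ratio:.1%}")  # average: 60.3%, maximum: 63.6%
print(f"SHARP maximum vs SMC average: {worst_vs_average:.1%}")  # 66.6%
```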

Table 2 also compares the performance of SHARP and the SMC, but with respect to inter-block connection (channel) usage, i.e., channel height and imbalance. SHARP's solution quality is superior to the SMC's for both components of channel usage. For all four statistics provided, SHARP's solution quality was at least 40% better than the SMC's.

                 Channel Congestion
Method     Height          Imbalance
           Avg    Max      Avg    Max
SMC        70     117      43     71
SHARP      39     45       18     29

Table 2. Comparing channel characteristics.
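Similarly, the 40% claim for Table 2 can be checked. The sketch reads the flattened table row by row, so that the SMC row is 70, 117, 43, 71 and the SHARP row is 39, 45, 18, 29; this reading is a reconstruction, chosen because it is the one consistent with the stated bound.

```python
# Table 2 entries as (average, maximum), read row by row
smc = {"height": (70, 117), "imbalance": (43, 71)}
sharp = {"height": (39, 45), "imbalance": (18, 29)}

# relative reduction achieved by SHARP for each statistic
improvement = {
    (metric, stat): 1 - sharp[metric][i] / smc[metric][i]
    for metric in smc
    for i, stat in enumerate(("avg", "max"))
}
for key, value in improvement.items():
    print(key, f"{value:.1%}")
print(min(improvement.values()) >= 0.40)  # True
```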

SHARP's performance on the pathological instance Random 1 was even more striking: it consistently used only 25-30% of the routing resources required by the SMC.

CURRENT  RESEARCH  ACTIVITY 
We are currently developing a family of physical design tools that make full use of SHARP's properties. For example, we are designing both parallel and sequential placers with built-in global routers. Other SHARP research is pursuing further refinement of the multi-objective function and the development of schedules for trading off the wire usage components, as well as the amount of partition overloading.

SUMMARY 

A new physical design technique, named SHARP, is presented for VLSI geometric partitioning. SHARP is a multi-objective, hill-climbing heuristic that is designed to be incorporated into a partitioning-based placement algorithm. Experimental analysis indicates that SHARP produces very high quality partitions that are more suitable for placement than those produced by conventional min-cut algorithms.